Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(dcutr): handle empty holepunch_candidates #5583

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

stormshield-frb
Copy link
Contributor

@stormshield-frb stormshield-frb commented Aug 30, 2024

Description

A few months ago, we were experiencing from time to time some weird failures with DCUTr. After some research to find out the problem, it was a race condition : sometimes identify must be a little bit slow and the DCUTr handler is created before an identify event is received. Alone, this is not necessarily a problem. But if this race happens when the holepunch candidates list of DCUTr is empty, then DCUTr will always fail for this connection. Indeed, when receiving an new relayed established connection, DCUTr will create an Handler for this connection which will be responsible to make the hole-punching. However, the candidates that are used are the one known at Handler instantiation, so any future updates about NewExternalAddrCandidate will not be forwarded to the Handlers.

This PR is the upstream of the fix we did several months ago and we did not encountered any particular problem with it since.

Timetable of the problem

Peer 1 Time Peer 2
Connected to relay 0.416
Receive identify from relay 0.420
0.646 Connected to relay
0.650 Dial Peer 1 through relay
0.655 Connected to Peer1 through relay
0.655 Create DCUTr handler with no hole-punch candidates
Connected to Peer 2 through relay 0.657
0.659 Receive identify from relay (first observed_addr)
DCUTr fail OutboundError(NoAddress) 0.663
0.664 DCUTr fail InboundError(UnexpectedEof)

Notes & open questions

  1. I have put some TODOs about the potential merging of self.attempts += 1. When inbound, self.attempts is incremented when starting an handshake, however, when outbound, self.attempts is incremented at the "new outbound substream" request. Before I don't think it was a problem, but now that we do not necessarily trigger an handshake if there is no hole-punch candidates, I think we might was to increment self.attempts only when effectively starting the handshake. What do you think ?

  2. It is noted in the log when starting a new handshake that, if the corresponding stream (inbound_stream or outbound_stream) was not empty, then we replace the handshake. There is warn level log stating New inbound/outbound connect stream while still upgrading previous one. Replacing previous with new. However, when reading the code of the FuturesSet::try_push method and then the FuturesMap::try_push method (which is used inside), the future pushed never replaces any old one when capacity is reached, it just returns an error. So what do you think should be done ? Should be actually replace the old with the new like the log says ? Or should we not replace the old with the new and update the log to say that the new one was dropped ?

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@dariusc93 dariusc93 requested a review from jxs August 30, 2024 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants